unstack in master expects sorted groups #1

Open: wants to merge 23 commits into base: enhance_join

gustafsson

When merged to master, this branch breaks the unit tests in https://github.com/JuliaStats/DataFrames.jl/blob/master/test/dataframe.jl

This commit preserves the behaviour that test expects from unstack, namely that groups are sorted. Not sorting groups is probably a good idea, but in that case the test needs to be updated as well.
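
To illustrate why the test is sensitive to group order, here is a minimal language-agnostic sketch in plain Python. It is not the DataFrames.jl implementation; the `unstack` function, its `sort_groups` parameter, and the sample data are hypothetical and exist only for this example.

```python
def unstack(rows, sort_groups=False):
    """Pivot long-format (row_key, variable, value) triples into wide rows.

    Hypothetical helper: groups come out in first-appearance order
    unless sort_groups=True.
    """
    table = {}
    for key, variable, value in rows:
        # dicts preserve insertion order, so keys stay in first-appearance order
        table.setdefault(key, {})[variable] = value
    keys = sorted(table) if sort_groups else list(table)
    return [(k, table[k]) for k in keys]


long_rows = [("b", "x", 1), ("a", "x", 2), ("b", "y", 3), ("a", "y", 4)]

unsorted_result = unstack(long_rows)                   # groups: b, then a
sorted_result = unstack(long_rows, sort_groups=True)   # groups: a, then b
```

A test that compares the output against a fixed, sorted expectation passes only with the second call, which is the situation described above.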

alyst and others added 23 commits November 13, 2015 22:22
use colordering(DFPerm) and getindex(DFPerm) to squeeze multiple lt()
methods into one
- don't encode the indexing columns, use DataFrameRow hashes instead
- do only the parts of left-right rows matching that are required for a
  particular join kind
- avoid vcat() that is very slow for PooledDataVector
- now join respects left-frame order for all join kinds, so the
  tests/data.jl tests were updated
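
The idea in the list above (hash-based matching that keeps left-frame order and does only the work a given join kind needs) could be sketched as follows. This is a plain-Python illustration under assumed names (`left_ordered_join`, `kind`), not the code from these commits.

```python
from collections import defaultdict


def left_ordered_join(left, right, key, kind="inner"):
    """Join two lists of dict rows on `key`, keeping the left rows' order.

    Hypothetical sketch: only "inner" and "left" kinds are handled.
    """
    # Index the right rows by key once, instead of comparing every pair.
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)

    out = []
    for l in left:  # iterating the left frame fixes the output order
        matches = index.get(l[key], [])
        if matches:
            for r in matches:
                out.append({**r, **l})  # left values win on name clashes
        elif kind == "left":
            # Unmatched left rows are collected only when the kind needs them.
            out.append(dict(l))
    return out


left = [{"id": 2, "a": "x"}, {"id": 1, "a": "y"}, {"id": 3, "a": "z"}]
right = [{"id": 1, "b": 10}, {"id": 2, "b": 20}]

inner = left_ordered_join(left, right, "id")
left_join = left_ordered_join(left, right, "id", kind="left")
```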
sorting order is changed from NA first to NA last (it matches
the default data frame sorting)
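
As a small illustration of the two orderings, the plain-Python sketch below uses `None` as a stand-in for NA. The key functions are hypothetical and unrelated to the DataFrames.jl sorting code.

```python
def na_last_key(value):
    # Missing values get a True flag, which sorts after False.
    return (value is None, 0 if value is None else value)


def na_first_key(value):
    # Flip the flag so that missing values sort before everything else.
    return (value is not None, 0 if value is None else value)


values = [3, None, 1, None, 2]
na_last = sorted(values, key=na_last_key)
na_first = sorted(values, key=na_first_key)
```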
so that DataFrameRow object doesn't need to be created
instead of using Dict{DataFrameRow,Int}, implement a custom dictionary that
 - doesn't require DataFrameRow objects
 - calculates hashes in memory-efficient manner
 - keeps row hashes for efficient comparison

use it for join(), groupby(), nonunique()
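
A rough sketch of the approach described above: hash rows column by column so that no per-row object is built, keep the hashes, and use them to bucket rows before comparing values. It is written in plain Python with hypothetical names (`row_hashes`, `group_rows`, `nonunique`) and makes no claim about how the commit's own dictionary is laid out.

```python
def row_hashes(columns):
    """Compute one hash per row by folding in one column at a time."""
    hashes = [0] * len(columns[0])
    for col in columns:
        for i, value in enumerate(col):
            hashes[i] = hash((hashes[i], value))
    return hashes


def rows_equal(columns, i, j):
    return all(col[i] == col[j] for col in columns)


def group_rows(columns):
    """Return a group id per row; ids are assigned in first-appearance order."""
    hashes = row_hashes(columns)
    buckets = {}   # row hash -> ids of groups whose rows have that hash
    first_row = [] # group id -> index of the first row in that group
    group_of = []
    for i, h in enumerate(hashes):
        gid = None
        for g in buckets.setdefault(h, []):
            # Values are compared only for rows whose hashes already match.
            if rows_equal(columns, first_row[g], i):
                gid = g
                break
        if gid is None:
            gid = len(first_row)
            first_row.append(i)
            buckets[h].append(gid)
        group_of.append(gid)
    return group_of


def nonunique(columns):
    """Flag every row that repeats an earlier row."""
    seen, flags = set(), []
    for g in group_rows(columns):
        flags.append(g in seen)
        seen.add(g)
    return flags


cols = [["a", "b", "a", "b"], [1, 2, 1, 3]]
groups = group_rows(cols)
dupes = nonunique(cols)
```

The same group ids can serve grouping and duplicate detection, which is one reason a shared structure is attractive here.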

disable DataFrameRowTests that use _RowGroupDict methods no longer
available
it's not used and it is no longer faster than nonunique()
by default no sorting is applied to preserve original ordering
(the initial order of the 1st rows is preserved) and make things faster

2 participants